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SIGNIFICANCE  AND  EXPLANATION 


Peal  data  sets  almost  always  contain  outlying  (extreme)  observations  and 
outliers  are  particularly  damaging  in  on  line  control  situations  in  which  the 
data  is  processed  recursively.  Thus,  an  extremely  bad  value  can  distort  the 
whole  mechanism  of  control  and  make  the  process  very  unstable. 

In  this  paper,  we  offer "a  relatively  simple  model  and  obtain  a  procedure 
to  deal  with  the  above  problem.  To  represent  the  appearance  of  bad  observa¬ 
tions,  a  scale  contaminated  normal  distribution  has  been  assumed  for  the 
measurement  error. 

In  fact,  we  have  shown  in  this  paper,  how  a  Bayesian  approach  allows  the 
development  of  a  simple  recursive  estimation  algorithm  that  has  the  desired 
property  of  "filtering”  bad  (i.e.,  extreme)  observations.  Indeed,  extreme 
values  are  downweighted  by  their  posterior  probability  of  being  spurious,  and 
the  estimates  of  parameters  are  updated,  recursively,  accordingly. 

Finally,  we  apply  our  model  to  the  case  of  exponential  smoothing  with 
contaminated  error,  and  show  that  the  parameter  estimates  obtained  from  the 
resulting  algorithm  are  a  weighted  combination  of  certain  2n  smoothing 
schemes.  The  application  of  the  procedure  to  a  broad  range  of  statistical 
estimation  problems  is  briefly  discussed. 


The  responsibility  for  the  wording  and  views  expressed  in  this  descriptive 
summary  lies  with  MRC ,  and  not  with  the  authors  of  this  report. 


ROBUST  KALMAN  FILTERING  AND  ITS  APPLICATIONS 


Irwin  Guttaan  and  Danial  Pana 


1.  Introduction  and  Summary 


Kalman  (1960)  introducad  a  method  of  updating  knowledge  about  the  "state"  of  process 
parameters,  say  £,  at  time  t,  using  a  least  squares  procedure.  This  method,  now  known 
as  Kalman  filtering,  has  wide  applicability,  froa  on-line  process  control  in  industry  to 
applications  in  econosd.cs.  Kalman's  results  are  reproducible  using  a  Bayes  approach  with 
normal  theory,  conditional  on  known  values  of  variances  and  co-variances  involved. 

One  aspect  of  the  filtering  process  is  that  it  is  sensitive  to  extreme  observations. 
Indeed,  one  or  aore  wild  observations  can  make  the  Kalman  filter  unstable.  This  a  well 
recognised  result  in  both  the  Statistical  and  Engineering  literatures,  and  is  discussed  in 
the  use  of  a  Kalman-type  filtering  schema  that  takes  into  account  the  possibility  of 
spuriously  generated  observations  giving  rise  to  extreM  observations.  This  filtering 
scheme  automatically  examines  the  possibility  that  the  current  observation  is  spurious,  and 
if  the  evidence  points  to  this,  downweights  that  observation  in  the  filter,  and  does  the 
opposite  for  seeadngly  "good"  (i.e.,  non  spurious)  observations. 

He  develop  this  filter  in  Section  3,  after  reviewing  the  standard  Kalman  filter  in 
Section  2.  Section  4  sketches  a  number  of  applications,  and  the  use  of  our  filter  in  these 
areas.  Finally,  Section  5  provides  sosm  discussion  of  our  results. 
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2.  The  standard  Kalman  Filter 


The  results  due  to  Xalman  (1960)  may  be  derived  and  approached  from  the  Bayesian  point 
of  view.  Suppose  that  we  observe,  at  time  t,  a  response  vector  say  of  order 

(pxl),  and  that  this  random  response  vector  is  such  that 

^  "  At£t  ♦  £y.  £y  ~  ^<2.0  <2‘1> 

where  At  is  a  (pxr)  matrix  of  known  coefficients,  ^  is  a  (rxl)  vector  of  unknown 
process  parasieters,  and  C  is  a  (p*p)  positive  definite  matrix,  assumed  known.  The 
vector  ^  is  referred  to  as  the  current  (i.e.,  at  time  t)  state  of  the  process  para¬ 
meters  8.  As  t  varies,  the  states  are  also  assumed  to  have  a  linear  structure,  vis,  for 


given  Vi'  V, 


St  '  QtVl  +  V  £e  ~  Nefl(2'V) 


where  Bt  is  a  r*r  known  matrix,  and  V  is  a  (rxr)  positive  definite  matrix . 

Finally,  it  is  also  assumed  that  we  have  prior  information  about  ^ ,  given  ■ 

y. . . •  .  This  assumption  is  somatlmss  referred  to  as  the  Inductive  Hypothesis,  and  says 
that  prior  to  observing  x*'  “d  9iv>n  ^  ,Xt_2 #  •  •  •  *  that  the  distribution  of  , 

given  ^ , . . .  has  structure 

Vi  -  V,  ♦  V  •  V  -  (2*3> 

**t"  • 

where  is  a  (rxr)  positive  definite  matrix.  He  shall  see  below  that  Ut-i  • 

function  of  the  y’s. 

Now  using  the  above  assumptions,  we  may  rapidly  deduce  that  the  prior  for  8^,  given 


'Ht.t-l^t.t-l’ 


l^tit-i  "  at  V»l 


V  ■  V  +  O  V  Q'  . 

t«t-1  t  t-1  t 


(2.4a) 


(The  subscript  "tit-1*  refers  to  the  fact  that  we  are  at  time  t  and  have  observed 
yt_ 1 . )  The  quick  and  easy  way  to  see  (2.4)  -  (2.4a),  is  as  followsi  He  have,  for  given 


*t  ■  at  St-1  +  Se 

~ t 


from  (2.2).  But  8^^  is  itself  random,  so  that  using  (2.3)  in  (2.5),  we  have 

St  “  °t^t-i  1  ♦  V 

~t-i  ~t 

(2.6) 

"  at  ^t-i  *  °t  *e.  .  *  £e  • 

~t- 1  -~t 

Because  of  the  namality  assumptions,  and  assuming  independence  of  @t_  1 ,  9^ ,  we  have  that: 


St~v(0twatVi°;*v»  . 

'*T 

as  claimed  in  (2.4)  -  (2.4a).  It  is  convenient  to  write  the  result  (2.4)  or  (2.7)  as 

St  “  ^ttt-1  *  ^tit-1  '  &t:t-1  ~  N(2'vt:t-1>  '  (2 

He  may  now  also  deduce,  using  assiasptions  (2.1),  (2.2)  and  (2.3),  with  the  result 

(2.4),  the  predictive  distribution  of  the  yet  unobserved  given  ^  For  from 

(2.1),  we  have 

*t  ’  AtSt  +  *y  (2 

where  ~  N(0,C),  and  from  (2.8),  given  we  have  that 

*t  "  *  Stjt-t*  + 


“  At  *t,t-1  +  At  *t,t-1  *  *y 
which  implies,  for  given  y  ^ ,  that 

*t  ~  N(At  Wv  Atvt.t-iAt  *  c)  * 

He  let  the  predictive  variance  of  (2.11)  be  denoted  by  that  is, 

«t  *  C  ♦  . 


(2.10) 


(2.11) 


(2.11a) 


How  the  results  (2.7)  and  (2.11)  give  the  distributions  of  and  x^  before  observing 

Xt«  namely  p(gt|yt_1 )  and  p(ytlxt_1).  respectively.  Now  when  we  observe  j^,  we  are 
in  the  position  to  deduce  the  posterior  for  8^,  given  xt •  tor  *re  in  the  position  of 
having  (2.1)  as  the  sampling  distribution  of  given  9^,  and  (2.4)  as  the  prior  for 

6^,  (given  etc).  He  remark  that  once  this  posterior  is  obtained,  it  plays  the  role 

of  (2.3)  for  the  next  stage,  to  be  discussed  below.  Now  using  Bayes'  Theorem  with  (2.1) 
and  (2.4)  as  the  necessary  ingredients  yields:  Given 


::  -  :  -  ■  :-'v  :y-:; 
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*t~N<*t:t  '  Vt,t> 


(2.12) 


*t,t  ‘  +  Vt,t-1AlMt  <*t  "  At  *t«t-1] 


(2.12a) 


*tc' *t  *  v^t-1 


so  that,  as  is  easily  verified. 


vt.t  -  vt,t-1  “  Vtit-I^  *tvt,t-1 


(2.12b) 


<2.12c) 


where,  we  recall  that  is  given  at  (2.11a). 

The  derivation  of  these  results  is  given  in  Appendix  Z.  Notice  the  updating  pattern 
frost  the  prior  mean,  to  contained  in  (2.12a).  Indeed,  (2.12a)  implies 

that  the  current  information  about  the  process  parameters  6^,  given  ^ ,  is  the 

prior  information  of  6^  given  ^  ^  plus  an  updating  term,  obtained  by 

“filtering”  the  deviation  of  ^  froa  ita  predictive  expectation  -  see  (2.11)  -  by  use  of 


the  matrix 


CTt  ■  Vt-t-lA'M;1  . 


(2.13) 


Indeed  the  matrix  KFt  of  (2.13)  is  referred  to  as  the  Kalman  Gain  matrix,  and  we  note 
that  an  alternative  form  -  see  Appendix  I  -  is 

»t  "  vt«tA'C_1  • 

Note  too  the  update  of  Vt|t  contained  in  (2. 12b)  -  for  example,  we  update  the 

precision  of  the  prior  of  J^,  given  by  adding  the  precision  A^T’a*  of  the 

regression  process  parameter  of  (2.1)  to  obtain  v£jt,  etc.  To  enter  the  next  stage 

we  replace  (2.3)  with  (2.12)  by  setting  and  vt«t  -  V  and  make  the  obvious 

modifications  in  (2.1)  and  (2.2),  and  repeat  the  process. 

To  start  the  Kalman  filter,  prior  conditions  for  the  state  vector  £  must  be  made, 
say  u0  for  its  expectation,  and  the  matrix  V0  for  its  variance  -  covariance  structure. 
Once  declared,  the  estimate  for  6^  the  state  of  nature  for  the  process  that  yields  y1 , 


®1i0  "  ^l  ^0  "  ^1j0 


(2.14) 


and  its  variance  -  covariance  matrix  is 


(2.14*> 


V1,0  ’  “iVV  +  v  • 


The  forecast  of  Che  new  observation 


*1 


(see  (2.11))  is 


l1  11:0 

with  variance  -  covariance  (see  (2.11)  -  (2.11a)) 


(2.15) 


*1  *  C  ♦ 


(2.15a) 

When  is  actually  observed,  we  update  as  follows  -  the  expected  state  of  nature  goes 

froB  *1:0  to  JBl*  1*  whare 

^1:1  "  ^1:0  +  V1:0Al’M11(^1  ”  *1^1:0* 
and  the  affiliated  variance  -  covariance  is  updated  to  s 1 ,  where 

v;ll  -  -  (V1i0  -  V^’M^V^)-1  . 

As  indicated  befoe,  we  now  set 

J4l  “  li1f1  V,  -  V1t1 

and  inquire  about  82,  prior  and  posterior  to  seeing  ^ ,  etc.  .  We  have 

*2  "  A2  ~2  *  V  ~y  ~  N(2'c>  » 


(2.16) 


(2.16a) 


(2.17) 


(2.18) 


with 


while 


®2  “  °2  Si  ♦  £*’  &e  ~  S(2'V) 


(2.18a) 


a,  -  ~  N(£'V1)  •  (2.18b) 

The  last  two  statements  may  be  combined  so  that  we  have,  given  ^ , 

®2  ~  N(a2  V  a2V1°2  *  V) 

or  (2.18c) 

~  N(U28i  *  v2:1>  * 

The  distribution  (2.18)  is  now  used  with  (2.18c)  to  find  the  posterior  of  S2,  given  £2, 
which  is 


S2~N<*2,2  '  V2,2>  <2-19 

where  H2.2  9iV8n  by  (2.12a)  with  t  ■  2  and  V2s2  determined  by  (2.12b),  etc.  , 
and  the  loop  continues  in  the  same  way  for  t  “  3,4,..  .  This  is  pictorallzed  in  Figure 


2.1  for  the  general  case  of  having  reach  state  t  -  1  and  observed  , 


etc.  . 


There  are  several  comments  to  be  made  about  the  foregoing  Kalman  filter.  Essentially, 
the  whole  process  Is  a  least  squares  procedure,  as  the  reader  will  not  doubt  have  known  or 
guessed.  Least  squares  estimates  are  well  known  to  be  non-robust  to  outlylnq  observations 
(see  Andrews  et  al  (1972)),  and  In  the  Kalman  filter  case  this  could  make  the  whole 
procedure  unstable,  with  devastating  consequences  in  some  situations,  such  as  line-process 
control  of  mass  produced  items.  Then  too,  when  p  >  3,  it  is  well  known  that  least 
squares  estimators  are  not  admissible  (see  Stein  (1956)). 

Finally,  the  assumption  (2.1)  that  implies  that  £y'8  come  from  the  same  distribution 
is  much  too  strong  in  practice.  Much  evidence  exists  that  shows  that  sets  of  data  almost 
always  contain  a  small  proportion  of  observations  that  have  been  spuriously  generated 
(i.e.,  not  in  the  manner  intended)  giving  rise  to  extreme  or  outlying  observations  (see  the 
general  discussion  in  the  paper  by  Box  and  Tiao  (1968)  and  Guttman  (1973)).  For  these 
reasons,  we  replace  the  assumption  (2.1)  by  a  more  realistic  sampling  model,  and 
investigate  what  form  the  ensuing  "Kalman  filtering"  process  will  take  in  the  following 
section . 
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3.  A  Robust  Algorithm 


3.1.  A  Different  Sampling  Model. 

As  indicated  in  the  previous  section,  the  oft-made  assumption  (2.1)  is  highly  suspect, 
and  skepticism  about  this  assumption  often  points  to  the  question  of  the  possible  effect  of 
outliers  on  procedures  developed  using  (2.1)  in  general,  and  for  us,  in  particular  on 
Kalman  Filtering.  Outliers  are  feared  mainly  due  to  the  fact  that  they  may  have  been 
generated  spuriously,  thus  biasing  results.  This  is  a  well  recognized  concern  in  the 
engineering  literature,  and  for  example,  the  papers  of  Alspach  and  Sorenson  (1971), 
Masreliez  (197S),  Masreliez  and  Martin  (1977)  and  Tsai  and  Kurz  (1983)  use  a  different 
approach  then  ours  to  meet  this  problem.  Of  course,  the  problem  of  how  to  deal  with 
outliers  in  other  situations  and  the  general  problem  of  " robustness"  of  various  procedures 
is  the  focus  of  much  of  the  current  Statistical  literature  -  see  the  references  cited  in 
Section  2,  for  example. 

Because  of  the  above  general  concern,  it  is  desired  to  establish  procedures  that  are 
robust  to  outliers  in  that  they  accomodate  the  appearance  of  abberant  observations 
appropriately  -  roughly  speaking,  giving  small  weight  to  observations  that  seem  spuriously 
generated,  and  large  weight  to  seemingly  "good*  observations. 

A  spur iousneBs  that  gives  rise  to  outliers  often  means  that  the  error  distributions 
involved  have  tails  heavier  than  those  of  the  normal  distribution,  we  will  generalize  below 
the  method  of  accomodating  outliers  used  by  Box  and  Tiao  (1968),  and  replace  the  assumption 
(2.1)  by  the  so-called  Scaled-Contaminated  Model  (SCM).  This  model  was  introduced  into 
statistical  practice  by  Jeffreys  (1961)  and  has  been  used  by  Box  and  Tiao  (1968)  to 
robustify  estimation  in  the  standard  linear  model,  by  Abraham  and  Box  (1979)  to  accomodate 
outliers  in  time  series,  etc.  Indeed,  Cheng  and  Box  (1980),  have  shown  that  the  SCM  model 
represents  a  sensible  modeling  in  many  situations  where  spuriousness  is  feared. 


ttie  SCM  model,  simply  stated  is  that 


At  fit  +  *y 


c  -  c^ms, C,)  +  a2N(2,c2) 


.  \--V.  .--V-  .W 


a 


In  (3.1),  we  assume  that  the  known  constants  a.,,  a2  are  such  that  a2  “  1-a1 ,  and  that 
a2  e(0,.15),  as  is  comon  in  most  applications.  Further,  we  also  assume  and  C2  are 

known,  and  such  that  by  any  measure,  C2  is  larger  than  C^.  For  example,  it  could  be 


that  (for  0^  a  known  scalar). 


C,  -  o2 I  ,  C2  -  k2o2I,  k2  >  1  .  (3.2) 

The  prescription  (3.1)  says  that  with  small  probability  a2 ,  ^  is  generated  spuriously 
from  N( A  8  ,  C_),  etc.  . 

In  addition  to  the  assumption  (3.1),  we  also,,  in  this  section,  make  the  assumptions 
(2.2)  and  (2.3)  of  the  previous  section,  and  inquire  into  the  question  of  how  the  3 
assumptions  (3.1),  (2.2)  and  (2.3)  affect  the  updating  procedure  discussed  in  the 
introductory  section.  For  convenience  we  list  the  assumptions  used  in  this  section  at  the 
point: 


(i) 

see 

(3.1)  -  (3.2) 

(ii) 

St  * 

at  St-i  +  £e 

(iii) 

St-1 

‘  Vi  +  V 

~t- 

The  assumptions  (ii)  and  (iii)  of  course,  give  rise  to  the  result  (2.4),  namely  that 
9t,  given  has  distribution  (the  prior  of  §^)  (2.4),  vis 

V*t«t-rvt.t-i)  l3*4) 

where 

Ut-t-i  ”  nt  Vi'  vt-t-1  ”  V  +  atVt-1flt’  *  (3.4a) 

We  can  now  determine  the  predictive  distribution  h( • |yL_  ^ ) .  say,  of  ,  given 

Formally,  this  is  defined  as 

^ 8  f(Xtlfit)p(~tl*t-1>d~t  (3-5) 

~t 

and  here,  f  is  dictated  by  (3.1),  while  p  is  obtained  from  (2.4).  The  result  of  doing 
the  integration  (3.5),  as  proved  in  Appendix  II,  is  as  follows: 

MxJXt-l’  ~  a1NXt(At  *t.t-1'Mt,1>  +  “2\Ut  Vt-1'Mt,2>  (3-6) 


I 
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Mt,i  -  ci  +  *  l-*,2 


(3.6a) 


We  note  that 


S<Xtlxt_i>  “  At  ^t>t-1'  V(*tlJ£t-1)  “  AtVt«t-1At  +  ^  ajCJ 


He  note  that  we  may  write 


?<3£tlxt_i>  "  “tMt,1  +  “2*1,2 


(3.6b) 


(3.6c) 


3.2.  The  resulting  recursive  algorithm. 

The  results  (3.4)  and  (3.6)  are  statements  that  can  be  made  before  seeing  Once 

Xt  is  observed,  we  are  in  the  position  of  being  able  to  find  the  (posterior)  distribtuion 
of  8^,  given  For  the  prior  of  8^ ,  we  have  the  distribution  specified  by  (3.4), 

while  the  (sampling)  distribution  of  8  is  given  by  (3.1).  The  result  of  using  Bayes' 
Theorem  with  these  ingredients ,  as  proved  in  Appendix  XI,  is  that  the  posterior  of  9^, 
given  ^  is  such  that 

2 

./A  I..  I  ^  V  «  u  f..^^  tv 


**tl*t>  ~  l  •  vt,t 

i*  1 


^Jt  “  ^it-i  +  vtJt-iAtMt,i<xt  "  ht  ^it-i* 

-  (V-1  *  A^C~1A  )_1 

tit  1  tJt-1  t  i  V 


V.  .  ,  -  V.  .  ,A^M~  V  AV 
tit-1  t«t-1  t  t,i  t  t»t-1 


(3.7a) 


(3.7b) 


and  where 


ait(*tl*t  at»t-1,Mt.i) 


t,i  2 


J-  ®jf(*tlAt  J4tit-1'Mt,J) 

j- 1  ' 


(3.7c) 


with  f  denoting  the  density  of  the  Normal  multivariate  distribution,  so  that,  in  general 


f<xla«*>  "  (2-r*,/2|MrV2.xp  -  • 


(3.7d) 


The  reader  will  not  doubt  recognise  that  the  denoadnator  of  (3.7c)  is,  on  using  (3.7d),  the 
predictive  density  h  of  x^»  given  Xt_^»  stated  in  (3.6).  Indeed,  using  (3.7d)  in 
(3.7c)  yields 
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\  V 


I 


i 


i 


I 

i 

! 

j 


r  ®2  i  i-1 

\.i  -  1  *  «7  1-Tm~7T^  expt2  (*t  -  \  •  Mt,2)  (VAt 

L  1  fc'  J  (3.7e> 


°t,2  *  1  ‘  at,1  ’ 


From  (3.7)  we  easily  find  (see  Appendix  n  for  proofs) 

.1.  -1  .  _  „-1 


E(StIXt)  ■  *t,t-1  +  Vt,t-lV°t,1Mt,1  +  at,2M"t,21(*t  •  At  JBt.t-1* 


^:t 


(3.8) 


'  Vt,t-1AtBtAtVt,t-1  *  V 


tit 


where 


“t,1Mt,1  *  “t,2Mt,2 


-1 


"t,1"t,2’"t,1  "t,2)(*t 


-  M. 


■^tit-l 


(3.8s) 

<Xt  -  *t  ‘  VV  ' 

The  quantities  <*t  ^  and  <»t  2  -  ( 1-at( i )  are  the  posterior  probabilities  that  jyt  has 
cone  fro*  the  intended  source  (i.e.,  M(A  8  ,C  ))  and  the  spurious  source  (i.e., 

t  JnJw  1 

N(At  8  c2».  respectively.  The  posterior  expectation  is  sade  up  of  two  parts  (c.f.  with 

(2. 12a)) i  the  prior  expectation  of  8^,  given  (see  (3.4))  and  a  deviation 

of  from  its  (marginal)  predictive  expectation  At  (see  (3.6b)),  but  this  time 

the  filtering  (gain)  matrix  is  the  weighted  sum  of  two  Kalman  gain  matrices,  where  the 
weights  are  the  estimates  a  ,  just  commented  on.  Put  another  way,  the  gain  matrix 
involves  the  weighted  sum  of  the  predictive  precisions  that  would  be  involved  if  sampling 
was  from  either  N(At  , C ^ )  or  H(At  f^.Cj),  with  weights  that  are  the  probabilities 

that  was  so  sampled.  For  more  introspection  about  the  updated  variance  -  covariance 

Vtjt,  we  first  invite  the  reader  to  inspect  the  updated  ' s ,  and  to  acquaint 

themselves  with  the  details  of  how  these  are  used  by  reading  the  proof  of  (3.8a)  in 
Appendix  II. 

He  remark  that  to  continue  this  procedure,  we  now  use  the  following  scheme 
Set  j ^  (see  (3.8))  and  -  Vfc|t  (see  (3.8a))  .  (3.9) 
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■  '  -  *  ■<  "l 
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e  *  ■  «  .  •  .  e  .  m  ■  .  •*  „  w  v  ,  .  .  *  «  *  m  Ja  b  >  *  *  »#»  **»*s  »  e**m*-w*  »  ■  V  m  W  • 

•  ’  »  ‘  VV  s^Js  *  e^  .  “•  .*«  *e  ^  •»  ’  -*  V’  %,*  **  *  •.’*"*  '  »  •  -  .  •  ■Tm  1 


We  My  now  enter  the  next  stage  -  we  replace  (lii)  of  (3.3)  with: 
Given  Jtt*  £*  -  +  £g  ~  N(J8»Vt> 


(3.11) 


and  unking  the  obvious  modifications  in  (3.1)  and  (ii)  of  (3.3)  -  t  is  replaced  by  (t+1) 
-  we  repeat  the  above  process. 

As  in  the  previous  section,  to  start  this  Kalman  Filter,  we  must  state  prior 
conditions  for  the  state  vector  £,  say  jig  for  its  expectation,  and  the  matrix  V0  for 
its  variance  -  covariance  structure,  cnee  these  are  declared,  then  using  (3.4)  we  have 


that  the  estimate  for  before  seeing  y  1  is 

£l,0  "  °1  *0  “*1,0 

and  the  associated  variance  -  covariance  Mtrix  is 


(3.12) 


The  forecast  of  the  new  observation 


*1:0  - v  ♦  aivr  • 

v.  is  (see  (1.5b)) 


(3.12a) 


*1  “  A1  *1:0  "  A1°1  *0 
with  associated  variance  -  covariance  Mtrix  (see  (3.6b)) 


(3.13) 


l  ajci  +  Aivi«oAi 

j«1 


(3.13a) 


(c.f.  with  (2.15a)). 


When  y 1  is  actually  observed,  we  update  as  follows  -  the  expected  state  of  nature 
goes  from  y1-0  to  y^,  where  (see  (3.8)) 


*1,1  -  *1.0  +  V1:0A;l«t,1Mi!l  *  “t,2Ml!2I(Xl  -  A1  *1,0’ 

where  the  matrices  are  defined  in  (3.6a),  and  where  the  ±  ere  given  by 

(3.7c).  The  associated  variance  -  covariance  is  (see  (3.8a)) 


(3.14) 


v1,1  *  v1,0  "  V1:0A1B1A1V1:0 


(3.14a) 


8,  “  <*.  .m”1.  o. 

1  1,1  1,1  1,2  1,2 


"  ai,1°1,2<M1,1  “  M1,2)(*1  "  A1  *1.0,(*1  "  A1  *1*0>,<M1,1  -  "l^* 

As  mentioned  in  this  section,  we  now  set 


(3.14b) 


li,  -  *,,,  »"<*  V,  -  V1s1 


(3.15) 


and  take,  for  tha  diatribution  of  8^,  given  y  1 ,  ptS.,!;^)  given  by 

<3.1« 

that  is,  9  ^  ~  N(iiliV1),  given  y1 .  Thia  may  be  combined  with  (ii)  of  (3.3)  for  t  «*  2 


to  yield 


P‘S2IX1>  -  f<S2lM2s1'V2*1» 


(3.17) 


H2s  1  “  n2  >il  '  v2t1  “  v  +  Q2V1°2  •  (3.17a) 

Now  the  distribution  of  %2 ,  given  82,  is  of  course  specified  by  (3.1),  that  is 

P<X2|62,Cl'C2,a1'a2>  “  fltif(X2lA2  W  +  a2f<X2lA2  ~2,C2)  <3-18> 

and  (3.18)  My  be  contained  with  (3.17)  to  produce  the  posterior  of  8^ ,  given  y2  which 


Pl£2l*2>  “  CI2,ift~2^2i2'V2:2) 


(3.19) 


where  and  V^2  are  found  from  (3.7a)  -  (3.7b).  The  probability  a2  1  is  of 

course  (see  (3.7c)) 


u1t(X2\A2  *2;  1'W2.  t* 

2,1  2 

A.  °if(*2lA2  J42!l'M2,i) 


(3.20) 


and  a2  2  -  1-a2  ^  and  the  loop  continues  in  this  way  for  t  •  3,4,...  etc.  This  is 


pictoralised  in  Figure  3.1  for  the  general  case. 


*  v 


->  y-  ■  . 

.*W  vN. f  -a 


Figure  3.1 
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The  previous  section  develops  a  filter  which  Bay  be  used  in  Many  various  situations. 
To  illustrate  this,  we  discuss  several  instances  where  the  new  filter  of  Section  3  stay  be 
applied. 


4.1.  Bayesian  Forecasting 

One  of  the  most  well  known  applications  of  the  Kalman  filter  in  Statistics  is  to  that 
of  Bayesian  forecasting,  as  developed  by  Harrison  and  Stevens  (1976).  These  authors  used 
the  standard  formulation  developed  in  the  engineering  literature  for  the  linear  control 
problem  that  w*  have  presented  in  our  Section  2,  calling  it  the  dynamic  linear  model.  They 
showed  that  most  statistical  models  can  be  considered  under  this  general  framework ,  which 
thus  allows  an  unified  approach,  namely,  by  using  the  Kalman  filter  for  recursive 
estimation  of  the  parameters.  Harrison  and  Stevens  (1976)  then  applied  these  idea*  to 
Bayesian  forecasting. 

As  our  robust  filtering  includes  the  standard  algorithm  as  a  particular  case,  the 
robust  version  of  the  algorithm  we  have  developed  can  be  applied  to  any  of  the  forecasting 
models  discussed  by  these  authors. 

To  illustrate  the  behavior  of  the  algorithm  we  shall  discuss  its  applications  to  one 
of  the  most  widely  used  model  for  forecasting,  the  so-called  Steady  Model.  This  model  is  a 
particular  case  of  the  general  formulation  with  A^  -  1,  -  1  and  with  yt  a  scalar. 

Then,  -  Dt_.j  and  Vt , t_ ,  -  V  ♦  Vt_|.  The  updating  equations  for  the  standard 

recursive  estimation  of  this  model  are 

V+V 

wt  “  “t-i  *  (c+vVv--)(rt  *  Mt-i‘  (4’1) 


c(v+v  ) 

v  «  -  -- 

t  c+v*vt_ 


(4.1a) 


After  some  iteration  the  system  will  reach  a  stable  state  in  which  Vt  -  Vfc_ ,  -  a.  Calling 
6  »  C ( c-t-v+a]  \  we  see  that  (4.1)  can  be  written  as 

Ut  -  8  Ut.,  ♦  (1-9)yt  (4.2) 
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and  applying  it  racuraivaly  and  aasuaing  Hq  •  0,  ve  obtain 

t-1  * 

U.  -  I  6  (1-0)y  .  (4.3 

*  i-1 

Me  have,  than,  fro*  (4.3)  tha  rasult  that  tha  ataady  model  is  aquivalant  to  tha  IMA  (1,1) 


nodal ,  which  is  aquivalant  to  exponential  smoothing. 

To  apply  tha  robust  Kalman  filtering  algorithn  wa  hava  davalopad,  wa  assuswd 
2  2  2 

C1  “  0  '  c2  “  *  0  *  th#n 

a  a 

it  -  .  ♦  (V  +  v  )[— — — — — -2 - J  (y  -  v  .) 

fc  t_1  t_1  o2+v+vt>>1  k2o2+vtvt_1  *  * 

and  assuming  as  bafora  a  stabla  stata  with  Vt  “  Vt_^  •  a,  than  calling 

0^  ■  o2(v2  +  v  ♦  a)”1  and  @2  “  (k2o2)(k202  ♦  v  ♦  a)  \  wa  hava 

“t  ■  Iat,iei  4  “t^V^-i  4  m-Vat,i  *  (1-e2>at,2lyt  • 

This  expression  shows  that  wt  is  obtainad  as  a  wsightad  combination  of  Mt.1  and 
yt,  as  in  tha  standard  caaa,  but  now  tha  waights  ara  changing  in  ovary  itaration  and 
depand  on  tha  probabilitiaa  f  and  at  2-  Sotting  0t  -  «t1e1  ♦  <*t202*  we  can  writs 

1»t  “  *^-1  +  *  1-®t^yt  <4,4 

and  now  tha  axprassion  of  as  a  function  of  tha  obaarvations  is 

n  (1-0  )  i 

Mt-  1  -S-—  yt.t(  »  «  )  .  (4.5 

i-0  t-i  J-O  3 

To  intarprot  this  aquation  lot  ua  danota  by  w(  r si , j ,k, . .h)  tha  probability  that  r 
observations  caaa  f roe  tha  "spurious”  or  "bad”  population  at  times  i,  J,  k,..h,  where 
there  are  r  symbols  i,  j,..,h.  Than 

n 

w(rii,J|k,..h)  ■  (  D  <*t  ^  )ai2fllJ2ak2 ' "  ,ah2  (d.0 

t^i , j, • • ,h 

and  let  us  call  |i  (rii,j,k,..h)  tha  smoothing  associated  with  this  combination  of 
observations  (see  (4.3)).  This  smoothing  is  tha  result  of  assusiing 

wt  ■  Vt-1 ♦  n-ei,yt  (4-7 

for  observations  yt,  t  ”  1,...  n  but  t  ^  i,j,k,..h,  and 

Wt  -  ♦  M-02)yt  (4.7s 

for  t  -  i,j,k,..h.  Then,  it  is  straightforward  to  show  that 


A 

Ut  -  Ew(rti,j,..h)M(r>i,j,..h)  (4.8) 

where  her*  the  luncion  is  over  sll  the  2C  possible  combinations  of  observations  chosen 
from  y.,...,yt.  The  Bain  advantage  of  our  algorithm  (4.5)  is  that  the  2fc  exponential 
smoothing  factors  are  the  result  of  the  recursive  relationship  (4.4)  and  are  not  computed 
separately  but  globally  and  here  in  a  very  direct  way. 


4.2.  t>o bust  Linear  Regression 

As  mentioned  previously.  Box  and  Tiao  (1968)  assumed  a  scale  contaminated  normal 
distribution  for  the  noise  of  a  regression  model  and  used  Bayesian  estimation  methods  to 
obtain  a  "robust"  estimation  procedure  that  downweights  suspicious  observations  in  the 
linear  model .  Additionally,  Chen  and  Box  (1979)  showed  that,  given  appropriate  values  to 
the  parameters  of  the  noise  distributions,  the  Box-Tiao  weights  can  reproduce  functions  for 
downweighting  residuals  using  M-estlmators  that  have  been  proposed  on  empirical  grounds  by 
Andrews  et  al  (1972).  Although  this  approach  provides  a  general  way  to  deal  with  outliers 
in  the  linear  model,  the  computations  needed  are  cumbersome,  because  the  posterior 
distribution  of  the  parameter  vector  is  a  weighted  average  of  2n  posterior  distributions. 
Box  and  Tiao  (1968)  suggested  that  one  need  compute  only  the  first  few  leading  terms,  but 
with  a  large  set  of  data  the  confutations  are  still  heavy,  and  there  are  no  clear  rules 
about  how  many  terms  we  would  need  to  obtain  a  proper  approximation.  Little  (1983)  has 
explored  the  relationship  between  the  Bayesian  weights  and  some  influence  measures  proposed 
for  the  linear  model  and  has  used  this  relationship  to  suggest  an  algorithm  to  determine 
which  of  the  weights  that  matter,  and  their  subsequent  computation.  Here  again,  however, 
the  computations  are  still  heavy. 

He  trill  show  in  this  section  how  we  can  confute  the  posterior  mean  in  a  simple  way 
using  the  robust  algorithm  we  have  suggested. 

The  regression  can  be  written  as  a  state  space  model  as 

yt  ■  *;  fit  *  s 

(4.9) 

fit  “  fit-1  “  fi 


where  yt  is  now  scalar  in  the  observation  equation  and  the  regression  parameter  vector  is 
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constant  over  tine.  Assuming  that  the  noise  has  a  scale  contaminated  normal 

distribution 


et  ~  (l-a)N(O ,o2)  aN(0,k2a2)  ,  k2  >  1  , 


(4.10) 


2  2 

for  fixed  values  of  a,  k  and  a  ,  we  then  have  a  particular  case  of  our  previous 


formulation.  In  practice,  however,  neither  of  these  three  parameters  are  known.  Although 
we  can  assume  a  and  k2  known  and  make  a  sensitivity  study  afterwards  (see  Box  and  Tiao 


(1968)),  the  problem  of  estimating  o  remains.  There  are  two  possible  solutions. 


The  first  is  to  compute  a  robust  estimator  of  o  and  use  this  value  in  the 


computations.  The  second  is  to  iterate  in  the  estimation  of  the  robust  value  until 


2  *2 

convergence.  Starting  with  a  preliminary  robust  estimate  of  a  ,  say  obtained  fro 


the  residuals  of  the  least  square  fit  that  leads  to  the  estimate  0 


(1)' 


we  compute  an 


estimate  using  the  robust  Kalman  filtering  algorithm  with  the  value  for  the 


variance.  Using  this  estimator  a  new  application  of  the  algorithm  is  made  which  provides  a 


new  estimator  3 


(3) 


and,  consequently,  a  new  set  of  residuals.  Prom  them  a  new  robust 


variance  estimator 


(3) 


is  built  and  the  process  is  repeated  until  convergence. 


In 

needed  are: 


this  particular  case,  writing  Vfc  «*  o\.  and  m^t  •  o2  *it  the  computations 


“It 

*2t 


“  (1  “  11  +h(i>) 


<k2  ♦  h(i)) 


(4.11) 

(4.12) 


r  .  a .  Ait  i  fit-i,2,  i  i  ,,i 1 

*t,1  -  1  ♦  <T=«>  /  —  2  U - - - )  (—  -  —  )t 

L  -2t  V)  "it  J 

fi(t)  ■  fi(t-D ♦  ^-1  ^r11  +  r^ht) 


(4.13) 


2t 


(4.14) 


It 


2t 


*(t >  “  yt  "  Xt  fi(t-l) 


V  -  V  x  b  x’  V 
t-1  t-1  «t  t  «t  t-1 


(4.15) 

(4.16) 


where  B„  -  b.  is  given  by 


(4.17) 


_!!lLL+Vt  r_1_  1  >2fe(t).2 

t  “  -  +  - - °t,1  t,2^' - T“J  - j  * 


i  in 

It  2t 


It  2t  (t) 


A  simple  alternative  for  a  robust  estimator  of  scale  is  the  median  absolute  deviation 


from  the  sample  median  y,  given  by 

MAD  •  median{|yt  -  y  | } 

This  estimator  is  known  to  be  more  efficient  for  the  contaminated  normal  distribution  than 

2 

the  sample  variance,  but  is  biased.  A  robust  unbiased  estimator  of  o  has  been  provided 


by  Tukey  et  al  (1977)  and  is  given  by 


o2  “  (.64)  1  mediant e  } 
i 


(4.18) 


where  the  ei  are  the  residuals  from  the  least  squares  fit.  The  final  covariance  matrix 

*2  •  • 

for  the  parameters  will  be  Vt  a  °r  Vt'  where  V  is  computed  using  (4.16). 

* 

Equation  (4.14)  shows  that  the  estimate  is  a  linear  combination  of  estimators 

2 

that  would  have  been  obtained  assuming  that  is  distributed  as  N(0,o  )  and 

2  2 

N(0,k  o  ),  respectively,  with  weights  that  are  the  posterior  probabilities  a.  's,  dis- 

t»l 

cussed  before.  As  for  the  variance,  instead  of  the  least  squares  expression: 


K'  ■  (*t-i +  «t  Kr' 


where  V^1  -  X'X,  now  equation  (4.11)  imposes  an  adaptive  formulation: 


K'  -  (\-i + 


(4.19) 


(4.20) 


where  at  »  bt/(1  -  h(i)bt)  depends  on  the  posterior  probabilities  of  the  observation 

being  an  outlier.  For  instance,  it  is  straightforward  to  see  that  if  a.  ,  -  1 ,  a.  »  1 , 

t,  l  * 

and  if  «  ,  -  1,  at  «  k“2. 

t,x  u 

This  algorithm  provides  a  useful  computational  device  to  estimate  £  when  the  set  of 

data  is  large.  However,  when  the  sample  is  small  there  are  two  problems  that  make  this 

approximation  a  crude  one.  First,  the  distribution  for  £  at  each  state  is  not  normal, 

2 

but  a  t-distribution  when  0  is  unknown.  Second,  with  a  small  set  of  data  the  ordering 
of  the  observations  can  influence  the  results.  These  two  problems  are  not  important  for  a 
large  data  set  because  (1)  the  t-distribution,  when  the  number  of  degrees  of  freedom  is 


large,  is  very  well  approximated  by  the  normal  and  (2)  the  order  of  the  observations  will 


a  *•••*  •  *  a  *  *  e  ’*  .  * 

•  s  *m  •  »  »  .  '  ■  ,  -  .  '  *  • 


not  affect  much  as  long  as  the  number  of  observations  is  large  and  stable  state  has  been 
reached. 

2 

The  above  is,  in  contrast  with  the  standard  Kalman  Filter  in  which  a  is  not  needed, 
while  here  the  variances  mit  are  required  to  compute  the  posterior  probabilities  that  an 
observation  has  been  spuriously  generated,  and  for  the  variance  of  the  estimation  itself. 


4.3.  Autoregressive  Time  Series  Estimation 

Assuming  in  the  previous  case  that  jc£  -  (yt_|, . . . ,yt-p) ,  the  model  reduces  to  the 
state  space  estimation  of  an  autoregressive  process.  Abraham  and  Box  (1979)  have  studied 
inference  in  this  type  of  model  when  there  is  a  Email  probability  that  "bad1*  observations 
occur.  The  exact  Bayesian  solution  is  again  difficult  to  compute  because  it  involves,  as 
in  the  regression  case,  the  computation  of  211  distributions. 

Although  in  this  case  the  previous  approach  can  be  UBed  to  obtain  a  robust  estimate  of 
the  parameters  when  all  the  data  has  been  collected,  the  algorithm  can  also  be  used  as  a 

robust  procedure  for  on-line  estimation  when  the  observations  are  received  sequentially. 

2 

In  this  latter  case  however,  a  way  to  compute  an  estimate  of  o  given  the  observed  data 
is  needed  because  at  every  instant  t  only  observations  y.|,...,yt,  will  be  available. 

To  solution  we  suggest  is  to  start  with  a  robust  estimate  of  the  variance,  and  update 

A 

this  estimation  when  the  new  residual  yt  -  yfc  is  computed.  The  algorithm  will  be  defined 
by  the  same  equations  but  now  (4.13)  and  (4.17)  are: 


with 


t1  + 

& 

t,r  i  ,(V*t  it-,.1,  . 

• 

exp  i  i x  )  ^  s 

®2t 

°t  “it 

.  ait 

+  ^21 

„  _  r  1  4  \2,®(t)> 

bt  -  — 

• 

"  “lt“2t(~  •  “)  - 

®it 

®2t 

“it  °2t  °t 

eu> 

-  yt  ■ 

'  *t1yt-1  ®tpyt-p 

(4.21) 


(4.22) 


(4.23) 


-  Hediante. _ . J/.64 


(4.24) 


t.  j  wri  "!F 


4.4.  Multivariate  estimation 

In  the  previous  examples  the  observed  vector  ^  was  a  scalar  but  the  algorithm  is 
also  very  simple  when  ^  is  a  vector.  As  an  example,  we  will  revise  only  the  multi¬ 
variate  regression  model.  The  standard  formulation  is 

*t  ■  ^  fit  ♦  £t 

(4.25! 

fit  •  Vi  “  * 

with 


s'  •••  a1' 

fi,t 

«t  * 

• 

- 

• 

a*  ■••••’  x; 

Kt 

where  x^  is  the  1  *  p  vector  of  explanatory  variables  and  git  is  the  p  x  1  vector 
of  parameters  linked  to  component  yit  of  yL •  The  noise  £t  is  assumed  to  have  a  mixed 
distribution  and  we  can  impose  different  structures  depending  on  the  particular  problem  on 
hand.  For  instance  if 


°tt  °U’**  °tk 


.°kr- 


kk 


(4.27) 


the  presence  of  a  multivariate  outlier  can  be  modelled  assuming  that  they  come  from  a 
distribution  with  covariance  matrix 


,  k*  >  1  '  (4.28) 

which  implies  that  the  components  of  an  outlier  are  unrelated.  This  is  referred  to  as 
external  structure  for  spurious  observations.  Another  example  is  C2  “  h2c1,  h2  >  1 ,  so 
that  the  components  of  outlying  observations  are  related  in  the  same  way  as  observations 
from  the  good  or  intended  source,  but  variances  of  outlying  components  are  larger.  This  is 
referred  to  as  internal  multivariate  structure  for  spurious  observations. 
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5.  Conclusions 


Real  data  sets  almost  always  contain  outlying  (extreme)  observations,  and  outliers  are 
particularly  damaging  in  on  line  control  situations  in  which  the  data  is  processed 
recursively.  Thus,  an  extremely  bad  value  can  distort  the  whole  mechanism  of  control  and 
make  the  process  very  unstable.  In  Industrial  practice,  for  instance,  all  types  of  ad  hoc 
procedures  have  been  developed  to  cope  with  this  situation  (see  Mah  and  Tamhane  (1982)  and 
Crowe  et  al  (1983)),  but  a  general  methodology  is  needed. 

In  this  paper,  we  offer  a  relatively  simple  model  and  obtain  a  procedure  to  deal  with 
the  above  problem.  To  represent  the  appearance  of  bad  observations,  a  scale  contaminated 
normal  distribution  has  been  assumed  for  the  measurement  error.  We  have  chosen  this  model 
because  other  authors  have  demonstrated  that  its  use  provides  sensible  solutions  to  other 
statistical  problems,  for  example,  in  linear  model  estimation. 

In  fact,  we  have  shown  in  this  paper,  how  a  Bayesian  approach  allows  the  development 
of  a  simple  recursive  estimation  algorithm  that  has  the  desired  property  of  "filtering*  bad 
(i.e.,  extreme)  observations.  Indeed,  extreme  values  are  downweighted  by  their  posterior 
probability  of  being  spurious,  and  the  estimates  of  the  parameters  are  updated, 
recursively,  accordingly. 

Finally,  we  apply  our  model  to  the  case  of  exponential  smoothing  with  contaminated 
error,  and  show  that  the  parameter  estimates  obtained  from  the  resulting  algorithm  are  a 
weighted  combination  of  certain  2n  smoothing  schemes.  The  application  of  this  procedure 
to  a  broad  range  of  statistical  estimation  problems  is  briefly  discussed. 
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Appendix  I 


As  mentioned  in  Section  2,  we  wish,  in  the  Appendix,  to  prove  the  results  (2.12).  We 

have  that  the  prior  for  is  Nfl  Uit.t_1»vt1t-1>'  where  **tst-1  and  Vtit-1  *re 

~t 

defined  at  (2.4a),  and  from  (2.1),  we  have  that  the  density  of  ^  is  Ny  (At  9t»C). 
Hence,  the  posterior  of  given  yt,  is  such  that 

p^lSt)  «  exP  -  1  {i£t  •  Ht.t-11'  vt!t-i[St  ■  *t:t-iJ 

(AI.1) 

♦  &t  -  \  £t]‘  C1^  -  At  et)}  . 

Now  the  expression  in  the  braces  of  the  exponent  in  A1.1  is  easily  seen  to  be 

s;ivt!t-i +  A;c"lAt]£t  -  2^iv;lt-i  Ht,t-i +  a;  c'V  +  con,t-(1)  <ai-2) 

which  in  turn,  on  completing  the  square,  has  the  form 

[4  ‘  Vt:t  St1*  Vt!t  l$t  ‘  Vt:t  K]  +  conBt*(1>  ‘  const.,2)  (A1.3) 

where  const*  3)  are  functions  of  At,  C,  Vt Jt_1  and  all  of  which  are  assumed 


(AI.2) 


(A1.3) 


known.  Also,  we  have  written  V  ^  for  the  matrix  of  the  quadratic  form  in  (AI.2)  and 


(AI.3),  that  is 


V"1.  “  V"1.  ,  +  A •  C"1  A. 
tit  tit-1  t  t 


and  we  have,  clearly,  that  bfc  is  given  by 

b  -  V-V  ,  u  _  ,  +  A’  C-1  v  . 

~t  tit-1  *tit-1  t  *t 

Hence,  p(2t|j|t.)  is  such  that 

-  7«t  -  vt:t  41’  -  vtit  V 

which  is  to  say,  that  a  posteriori,  0.  ~  N(U  ,V  ),  where 

’Uw  v!  t  t!  t 

u  -  V  b 
*tit  tit  ~t 


(AI.4) 


(AI.4a) 


(AI.5) 


(AI.6) 


and  Vt.t  is  such  that  its  inverse  is  as  defined  in  (AI.4).  It  remains  to  show  that  Vt.t 
is  given  by  (2.12c)  and  that  (AI.6)  may  be  rewritten  as  in  (2.12a).  For  the  latter,  we 


have,  from  (AI.6),  on  using  (AI.4a),  that 

i*  -  V  [v'1 


^lt  “  Vt!t  Vt!t-1  ^tit-1 


+  A’  c"1 


(AI.7) 


*t:t  ■  Vtitl(Vt!t-1  +  K  ^V^tit-I  +  V'V  -  At  «t:t-1)3  (AI-7*> 


and  using  (AI.4)  we  have 


JBt.t  '  Vt,t[Vt.t  4,t-1  +  A;  c  (*t  -  At  *t:t-1,J 


so  that 


*t,t  '  *t:t-1  *  Vt!t  At  C  <*t  •  At 
To  show  the  equivalence  of  (AX. 7c)  with  (2.12a),  we  must  show  that 

Vt s t^tc  1  "  ''tit-l*^*^  1 


where  is  as  defined  at  (2.11a).  But 


vtit  *  Vtlt  Vt;t_, 

and  again  using  (AX.4),  we  have 

Vt:t  -  vt.t[v;!t  -  A£  C'1 


or 


which  is  to  say 


Vtit  "  vtit-1  "  vt:t  *t  C  1  *t  Vt:t-1 


Vt:t  +  Vt.t  *t  C"1  »t  -  Vt!t_, 


Post  multiplying  by  a£  on  both  sides  of  (AI.9c),  we  may  write 

vt,t  *t  c_1c  ♦  vt.t  K  i’Vt.t-iAi  *  vt s t- iAt 


and  from  (2.11a) 


vt:t  K  c',<c  +  Vtit-lV  -  Vt,t-1  K 


Vtit  *t  c~’  -  Vtit-1  *i  <* 


and  (AI.8) ,  and  hence  (2.12a),  is  now  demonstrated. 

Finally,  we  wish  to  derive  the  result  (2.12c).  He  have  from  (2.12b)  that 

v"1.  »  V”1.  ,  (I  +  V  A*  C-1  A  ]  . 

tit  tit-1  t:t-1  t  t 

Hence 

vt,t  ■  [I  +  vt,t-1  K  c’’  V1vt:t-1 

and  using  a  well  known  identity  for  inverses  of  matrices  of  he  form  X  +  EF,  viz 

(I+EF)*1  -  I  -  E(I+FE)_1F 

with  E  identified  as  Vfcst_1  a£  C-1  and  F  as  At,  we  find 

Vt:t  “  {I  “  Vtit-1  At  C"1[I  +  AtVt:t-1AtC~1,"lAt}Vtit-1 

so  that 


(AX. 7b) 


(AX. 7c) 


(AI.8) 


(AI.9) 


(AX. 9a) 


(AX. 9b) 


(AX. 9c) 


( AI.9d) 


(AI.9e) 


(AX.9f ) 


(AI.10) 


(AI. 10a) 


(AX. 11) 


(AI.12) 
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Appendix  IZ 

Thia  appendix  is  devoted  to  the  derivation  of  results  (3.6),  (3.7)  and  (3.6). 
He  have  from  (3.S),  with  f  given  by  (3.1)  and  p  given  by  (2.3),  that  the 


predictive  distribution  of  given  ^ ,  ie  such  that 

2 

hixtlxt-i*  *  l  ai  Je  f,*tlAt  Afci>f<atl«t,t-i'vt,t-i>dSt 

i”  1  ~t 

where  the  density  f  is  defined  in  (3.7d).  But  the  integral  operation  is  clearly 


equivalent  to  the  arguawnt  used  in  (2.10)  with  C±  replacing  C  so  that  f roe  (2.10) 
(2.11)  we  have 


h()ttl*t.1)  -  £  V(*JAt  J^t-i'Vi*  <AI1*2) 

where  is  as  defined  in  (3.6a)  -  see  (2.11a),  and  the  result  (3.6)  is  new  proved. 

To  derive  the  results  (3.7),  we  note  that  (3.4)  gives  the  prior  for  £t,  while  the 
distribution  specified  by  (3.1)  dictates  the  likelihood  of  given  6^.  Hence  the 

posterior  of  given  ^  auch  that 

2 

p(JLlx»)  *  i  ajftxJft*  fi>,C,)f(fik|jBfc.._,,Vfc.fc_,)  .  (All. 3) 


p^l^t1  "  ®lf(*tlAt  ®t'cl>f(®tl,Jtlt-1'vtlt-1> 


In  common  to  the  work  of  Appendix  I,  the  sueeand  of  (All. 3)  requires  the  "completion  of  a 
square"  operation  to  fore  a  quadratic  fore  in  9.  .  However,  since  the  "constants"  left 
over  in  this  operation  depend  on  "i",  they  cannot  be  absorbed  by  the  constant  of 


proportionality.  Now  for  the  1th  tern,  we  have  in  the  exponent,  apart  f roe  the  -  -j. 


%  -  At  v'lV  -  At  v  ♦  <St  ■  *t.t-i,,v;!t-i(St  -  *t,t-i»  (wi-4> 


and  from  the  work  in  Appendix  I,  we  see  that  (All. 4)  nay  be  written  as 

[£t  ■  vt,l  '’l2t  ■  vut  4i,]  +  K  c'i  Xt 


(All. 5) 


♦u.  V-1  u  -  b(i)'  V(i)  b(i) 

*tit-1  tst-1  *tit-1  ~t  tit  ~t 


rn  (V-1  +  A*  c"1  A  )"' 

vtlt  lvtlt-1  At  ci  At* 


(All. 6) 
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(All. 13) 
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•o  that  using  this  In  (All. 12)  we  now  have 


-  jj  *tll 


with  a.  .  as  advertised  in  (3.7c). 

W  * 


Now  using  (All.  14) ,  we  can  calculate  aomcnts.  Me  have 


KflJV  -  I  a  I 

i“1 

-  I  «  «“> 

if1  t,i  *t«t 


and  using  (All .11a),  which  is  the  result  (3.7a),  we  have 


*(Stixt»  ■  ^t,t-i  ♦  ]  at,ivt«t-iA;Mt!i(*t  -  \  Bt.t-i1 


which  swy  be  expressed  as  in  (3.8),  or  as  above  in  (All. 15). 

To  find  the  variance  -  covariance,  we  first  determine  *(£t  J>£lxt)  and  then 
identity 

vtejxt)  -  B(et  s;|Xt)  -  I*cet|xt.)]  [K(etlxt>r  . 

Now  from  (All. 14),  we  have 
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i-1 
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!  a  IV11’  .u111  ull,,) 
4“  t,il  tit  *tst  *tJt  1 


Jl) 


Now  V'  '  is  given  in  (All. 6)  so  that  we  have 
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(All. 16) 


use  the 
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